Geospatial Data can be defined as any data that is linked to a specific location on the surface of the Earth.
A lot of geospatial data is publicly available online.
sf}The sf package provides simple features access for R. Simple features is a common storage and access model of mostly two-dimensional geometries (like points, lines, and polygons) used by geographic information systems.
You can convert data associated with a specific latitude and longitude to an sf object, which is similar to a data frame, but with extra information about the geometry.
raw_data <- read_csv("simulated_case_locations.csv")
library(sf)
d_events <- st_as_sf(raw_data, coords = c('X', 'Y')) %>%
st_set_crs(4326)
d_events
## Simple feature collection with 5227 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -84.81942 ymin: 39.04028 xmax: -84.25902 ymax: 39.31012
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5,227 x 2
## id geometry
## <dbl> <POINT [°]>
## 1 1 (-84.52842 39.10935)
## 2 2 (-84.5291 39.10753)
## 3 3 (-84.53005 39.10966)
## 4 4 (-84.52883 39.10792)
## 5 5 (-84.53021 39.10781)
## 6 6 (-84.53026 39.10672)
## 7 7 (-84.5315 39.10842)
## 8 8 (-84.52977 39.10774)
## 9 9 (-84.52776 39.11035)
## 10 10 (-84.52991 39.10767)
## # … with 5,217 more rows
tigris}The tigris package allows a user to directly download and use TIGER/Line shapefiles (including boundaries for states, counties, census tracts, etc.) from the U.S. Census Bureau in R.
For example, we can download the shapefile for the census tracts in Hamilton County, Ohio.
hamilton_tracts <- tigris::tracts(state = '39', county = '61')
We can perform a spatial join using st_join() from the sf package to overlay the points from our event data into their census tracts (i.e., “assign” each point to its census tract). Then we can calculate the number of events in each tract using group_by() and summarize() from the tidyverse.
d_events <- st_join(d_events, hamilton_tracts)
d_tracts <- d_events %>%
group_by(GEOID) %>%
summarize(n_events = n()) %>%
st_set_geometry(NULL)
## # A tibble: 222 x 2
## GEOID n_events
## <chr> <int>
## 1 39061000200 12
## 2 39061000700 3
## 3 39061000900 13
## 4 39061001000 9
## 5 39061001100 13
## 6 39061001600 11
## 7 39061001700 16
## 8 39061001800 6
## 9 39061001900 3
## 10 39061002000 1
## # … with 212 more rows
tidycensus}The tidycensus package interfaces with the US Census Bureau’s decennial Census and five-year American Community APIs and returns tidyverse-ready data frames.
For example, we can download the population under 18 of each census tract in Hamilton County.
d_pop <- tidycensus::get_acs(geography = 'tract',
variables = 'B09001_001E',
year = 2016,
state = 39, county = 61,
geometry = TRUE) %>%
select(GEOID, n_children = estimate)
d_pop
## Simple feature collection with 222 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.82016 ymin: 39.02208 xmax: -84.25651 ymax: 39.31206
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## First 10 features:
## GEOID n_children geometry
## 1 39061000200 234 MULTIPOLYGON (((-84.53186 3...
## 2 39061000700 80 MULTIPOLYGON (((-84.51926 3...
## 3 39061000900 325 MULTIPOLYGON (((-84.52104 3...
## 4 39061001000 225 MULTIPOLYGON (((-84.51595 3...
## 5 39061001100 351 MULTIPOLYGON (((-84.5109 39...
## 6 39061001600 259 MULTIPOLYGON (((-84.52619 3...
## 7 39061001700 348 MULTIPOLYGON (((-84.51742 3...
## 8 39061001800 159 MULTIPOLYGON (((-84.51075 3...
## 9 39061001900 124 MULTIPOLYGON (((-84.50007 3...
## 10 39061002000 111 MULTIPOLYGON (((-84.4879 39...
We can use this information to calculate the event rate per 1000 children in each tract.
d <- left_join(d_pop, d_tracts, by = 'GEOID') %>%
mutate(event_rate = n_events / n_children * 1000)
d
## Simple feature collection with 222 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.82016 ymin: 39.02208 xmax: -84.25651 ymax: 39.31206
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## First 10 features:
## GEOID n_children n_events geometry
## 1 39061000200 234 12 MULTIPOLYGON (((-84.53186 3...
## 2 39061000700 80 3 MULTIPOLYGON (((-84.51926 3...
## 3 39061000900 325 13 MULTIPOLYGON (((-84.52104 3...
## 4 39061001000 225 9 MULTIPOLYGON (((-84.51595 3...
## 5 39061001100 351 13 MULTIPOLYGON (((-84.5109 39...
## 6 39061001600 259 11 MULTIPOLYGON (((-84.52619 3...
## 7 39061001700 348 16 MULTIPOLYGON (((-84.51742 3...
## 8 39061001800 159 6 MULTIPOLYGON (((-84.51075 3...
## 9 39061001900 124 3 MULTIPOLYGON (((-84.50007 3...
## 10 39061002000 111 1 MULTIPOLYGON (((-84.4879 39...
## event_rate
## 1 51.282051
## 2 37.500000
## 3 40.000000
## 4 40.000000
## 5 37.037037
## 6 42.471042
## 7 45.977011
## 8 37.735849
## 9 24.193548
## 10 9.009009
There are many different ways to visualize geospatial data in R.
First, we could use the plotting features of base R.
plot(d['event_rate'])
ggplot2} and geom_sf()Using ggplot give us more flexibility and potential for customization, and is comfortable for those already familiar with ggplot functionality.
ggplot() +
geom_sf(data=d, aes(fill=event_rate)) +
scale_fill_viridis_c() +
labs(fill="Event Rate") +
ggthemes::theme_map() +
theme(legend.position = c(0.9, 0))
mapview}The mapview package is useful for quick and convenient interactive visualisations of spatial data. It was created to fill the gap of quick (not presentation grade) interactive plotting to examine and visually investigate both aspects of spatial data, the geometries and their attributes.
mapview::mapview(d, zcol='event_rate')
tmap} : Static ModeThe tmap package allows the user to thematic maps with great flexibility. The syntax for creating plots is similar to that of ggplot2, but tailored to maps.
tmap has two modes. plot mode is useful for publication-ready static maps.
library(tmap)
tmap_mode(mode='plot')
tm1 <- tm_shape(d) +
tm_fill(col='event_rate', palette='viridis', title="Event Rate") +
tm_shape(d) +
tm_borders() +
tm_layout(legend.position = c(0.9, 0),
frame = FALSE) +
tm_scale_bar(position = c(0,0)) +
tm_compass(position = c(0.05, 0.1), size = 3, lwd=1)
tmap} : Interactive ModeWe can change tmap_mode to view and print interactively view the static map we created in the previous step. This is similar to mapview but has more customizability.
tmap_mode(mode='view')
tmap}tmap_mode("plot")
tm2 <- tm_shape(d) +
tm_polygons(c('event_rate', 'dep_index'),
palette = list('Blues', 'Reds'),
title=list("Event Rate", "Deprivation Index"),
alpha=0.8) +
tm_facets(sync = TRUE, ncol=1) +
tm_layout(legend.position = c(0.9, 0),
frame = FALSE) +
tm_scale_bar(position = c(0,0)) +
tm_compass(position = c(0.05, 0.1), size = 3, lwd=1)
tm2
tmap_mode("view")
tm2
biscale}The biscale package uses features of ggplot2 to create bivariate choropleth maps.
library(biscale)
d_biscale <- bi_class(d, x = dep_index, y = event_rate, style = "quantile", dim = 3)
map <- ggplot() +
geom_sf(data = d_biscale, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
bi_scale_fill(pal = "DkViolet", dim = 3) +
bi_theme() +
theme(title = element_text(size=15))
legend <- bi_legend(pal = "DkViolet",
dim = 3,
xlab = "Higher Deprivation",
ylab = "Higher Event Rate ",
size = 8)
library(cowplot)
b1 <- ggdraw() +
draw_plot(map, 0, 0, 1, 1) +
draw_plot(legend, 0, 0, 0.2, 0.5, vjust=0.1, hjust=-0.5)